Data-Driven Tree Transforms and Metrics
Abstract
We consider the analysis of high-dimensional data given in the form of a matrix whose columns are observations and whose rows are features. Often the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining a representation and a metric that respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features to the case where the data does not fall into clear disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets together. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer sub-types and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.
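The abstract does not give formulas, so the following is only a minimal sketch of what an alternating tree-and-metric refinement could look like, not the authors' algorithm. It assumes a bottom-up average-linkage partition tree and a tree metric defined as a size-weighted sum of absolute differences between folder averages. The names `build_tree`, `tree_metric`, `organize`, and the toy `data` matrix are all hypothetical.

```python
from itertools import combinations


def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def build_tree(points, dist):
    """Bottom-up binary partition tree (average linkage, an assumption).

    Returns every folder created, leaves included, as frozensets of indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    folders = list(clusters)
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda p: sum(dist(points[i], points[j]) for i in p[0] for j in p[1])
            / (len(p[0]) * len(p[1])),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        folders.append(a | b)
    return folders


def tree_metric(folders, n):
    """Distance between length-n vectors induced by a tree on their coordinates.

    Assumed form: sum over folders of (folder size / n) * |difference of folder means|.
    """
    def dist(x, y):
        total = 0.0
        for f in folders:
            mean_gap = sum(x[i] - y[i] for i in f) / len(f)
            total += (len(f) / n) * abs(mean_gap)
        return total
    return dist


def organize(matrix, n_iter=3):
    """Alternate between trees on observations (columns) and features (rows)."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]
    col_dist = euclidean  # initial metric on observations (an assumption)
    for _ in range(n_iter):
        col_folders = build_tree(cols, col_dist)         # tree on observations
        row_dist = tree_metric(col_folders, len(cols))   # metric on features
        row_folders = build_tree(rows, row_dist)         # tree on features
        col_dist = tree_metric(row_folders, len(rows))   # refined metric on observations
    return row_folders, col_folders, col_dist


# Toy matrix: 4 features (rows) by 4 observations (columns), two obvious groups.
data = [
    [1.0, 1.1, 5.0, 5.2],
    [0.9, 1.0, 5.1, 4.9],
    [3.0, 3.1, 0.2, 0.1],
    [2.9, 3.2, 0.0, 0.3],
]
row_folders, col_folders, col_dist = organize(data)
```

In this sketch the tree built on one dimension defines the metric used to build the tree on the other, which is one simple way to exploit the co-dependency between features and observations. A learned `col_dist` could then be passed to any distance-based clustering of the observations.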
Journal: CoRR
Volume: abs/1708.05768
Publication date: 2017